Incremental Learning of Control Knowledge for Improvement of Planning Efficiency and Plan Quality

Authors

  • Daniel Borrajo
  • Manuela Veloso
Abstract

General-purpose planners use domain-independent search heuristics to generate solutions for problems in a variety of domains. Because they are only heuristics, however, there are situations in which they do not provide effective guidance, and the planner performs inefficiently or obtains solutions of poor quality. Learning from experience can help to identify the particular situations in which the domain-independent heuristics need to be overridden. In this paper, we present a system, HAMLET, that learns control knowledge and incrementally refines it, allowing the planner not only to solve complex problems efficiently, but also to generate solutions of good quality. We claim that incremental learning of control knowledge and consideration of the quality of the solutions are two fundamental research directions towards the goal of applying planning techniques to real-world problems. We present empirical results in a complex domain that support these claims and show the promise of our approach.

This research is sponsored by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F33615-93-1-1330. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U.S. Government. This work was initiated while Borrajo was at Carnegie Mellon University on leave from the Universidad Politécnica de Madrid, supported by grants from the Ministerio de Educación y Ciencia and from Comunidad de Madrid.

Introduction and Related Work

Most systems that learn strategic knowledge in problem solving have been applied to problem solvers that rely on the linearity assumption, such as those applied to Prolog or logic programming (Quinlan 1990; Zelle & Mooney 1993), special-purpose problem solvers (Langley 1983; Mitchell, Utgoff, & Banerji 1983), or other general-purpose linear problem solvers (Etzioni 1993; Leckie & Zukerman 1991; Minton 1988; Pérez & Etzioni 1992). These problem solvers are known to be incomplete and unable to find optimal solutions (Rich 1983; Veloso 1989). If we remove the linearity assumption, we are dealing with nonlinear problem solvers. This kind of problem solver is needed to address complex real-world problems.

Some nonlinear planners search the plan space by using partially-ordered plans (Chapman 1987; McAllester & Rosenblitt 1991). Others remove the linearity assumption by fully interleaving goals, searching in the state space, and using totally-ordered plans (Veloso 1989; Warren 1974). Most of these approaches use backward chaining, although some use forward chaining (Bhatnagar 1992; Laird, Rosenbloom, & Newell 1986). In general, only a few learning approaches have been applied to nonlinear problem solving (Bhatnagar 1992; Kambhampati & Kedar 1991; Laird, Rosenbloom, & Newell 1986; Pérez & Carbonell 1994; Ruby & Kibler 1992; Veloso 1992).

In this paper we show that nonlinear problem solving offers new learning opportunities, where domain-dependent control knowledge may be used to improve not only the performance of the problem solver but also the quality of the solutions it produces. Both improvements are needed to scale up the kinds of domains and problems that planners can solve. Constructing correct explanations of the successes and failures of a nonlinear problem solver from a single example may be computationally very expensive, because the generalization phase needed to generate provably correct knowledge would have to consider a large number of possible combinations of planning situations.
To alleviate this effort, we developed a new approach, implemented in HAMLET [1], in which control knowledge for individual decisions is incrementally acquired through experience. HAMLET is integrated with PRODIGY4.0, the current nonlinear problem solver of the PRODIGY architecture for planning and learning (Carbonell et al. 1992). HAMLET learns local control rules by first lazily explaining the decisions made during problem solving from individual examples. Upon finding new positive and negative examples of the use of its learned rules, HAMLET incrementally induces and refines its control knowledge (Borrajo & Veloso 1993; 1994a).

[1] “HAMLET” stands for Heuristics Acquisition Method by Learning from sEarch Trees.

A similar lazy approach can be found in (Tadepalli 1989), where LEBL (Lazy Explanation Based Learning) is presented. The main difference is that, while Tadepalli refines the knowledge by introducing exceptions, HAMLET modifies the control rules themselves by adding or removing their applicability conditions. Also, Tadepalli applies the approach to game playing, while we use it for general task planning.

While improving problem solving performance has been studied extensively, learning to improve solution quality has only recently been pursued by some researchers, including (Pérez & Carbonell 1994; Ruby & Kibler 1992). We differ from Pérez’s work in that HAMLET performs inductive refinement of the control rules, and in the way positive examples are generated. Ruby and Kibler’s approach differs in the representation of the learned control knowledge, since it is a case-based learner. HAMLET combines the two kinds of optimization by learning control rules that allow the planner not only to search more effectively, but also to achieve better solutions.

In this paper, we discuss the learning opportunities that arise in problem solving and how we extended previous EBL work. We then present HAMLET, describing the main features of its deductive, inductive, and refinement modules.
Finally, the paper shows empirical results on this work and draws conclusions.

Learning Opportunities

In order to solve problems efficiently in real-world applications, general-purpose problem solvers must use efficient domain-independent heuristics to improve their search performance. In this section we show that additional domain-dependent control knowledge may be used to improve not only the performance of the problem solver but also the quality of the solutions it produces. To illustrate our points we use PRODIGY4.0, but the reasons for learning that we highlight apply, in general, to any problem solver.

The current nonlinear problem solver in PRODIGY, PRODIGY4.0, follows a means-ends analysis backward-chaining search procedure, reasoning about multiple goals and multiple alternative operators relevant to the goals. PRODIGY4.0 is a successor to the earlier linear PRODIGY2.0 (Minton et al. 1989) and to the first nonlinear and complete planner, NOLIMIT (Veloso 1989). The inputs to the basic problem solver algorithm are the set of operators specifying the domain knowledge, and a problem specified in terms of an initial configuration of the world and a set of goals to be achieved. Table 1 shows the skeleton of PRODIGY4.0’s planning algorithm.

Table 1: A skeleton of PRODIGY4.0’s planning algorithm and choice points.

1. Terminate if the goal statement is satisfied in the current state.
2. Compute the set of pending goals 𝒢 and the set of applicable operators 𝒜. A goal is pending if it is a precondition, not satisfied in the current state, of an operator selected to be in the plan to achieve a particular goal. An operator is applicable when all its preconditions are satisfied in the state.
3. Choose a goal G from 𝒢 or select an operator A from 𝒜.
4. If G has been chosen, then
   • Expand goal G, i.e., get the set 𝒪 of relevant instantiated operators that could achieve the goal G,
   • Choose an operator O from 𝒪,
   • Go to step 1.
5. If an operator A has been selected as directly applicable, then
   • Apply A,
   • Go to step 1.

The planning reasoning cycle involves several decision points, namely: which goal to select from the set of pending goals and subgoals; which operator to choose to achieve a particular goal; which bindings to choose in order to instantiate the chosen operator; and whether to apply an operator whose preconditions are satisfied or to continue subgoaling on a still unachieved goal. Default decisions at all these choices can be directed by explicit control knowledge. Although PRODIGY can use a variety of powerful domain-independent heuristics (Stone, Veloso, & Blythe 1994), it is very difficult and costly to determine in general which of these heuristics are going to succeed or fail. Therefore, learning can be used to acquire automatically the control knowledge that overrides the default behavior of a particular domain-independent search heuristic and drives the planner more efficiently to a solution. Note that this need to learn when particular domain-independent search strategies do not produce desirable results is common to any planner (Veloso & Blythe 1994).

Another reason for learning relates to the optimality of the solutions obtained by the problem solver. Our current measure of optimality is the length of the solution [2]. There are many domains in which the knowledge necessary to select the optimal solution is not explicit in the definition of the domain knowledge, or in which it is costly to find the optimal solution by breadth-first search. In those cases, learned control knowledge can direct the search towards optimal solutions, as also pointed out by (Pérez & Carbonell 1994).

Extending Previous Work

HAMLET extends the EBL methods used with the linear planning algorithm of PRODIGY2.0 (Etzioni 1993; Minton 1988; Pérez & Etzioni 1992) so that they apply to the nonlinear PRODIGY4.0.
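As a rough illustration of the PRODIGY4.0 decision cycle summarized in Table 1, the following Python sketch walks through the five steps on a toy domain. It is not PRODIGY4.0's implementation: the `Operator` class, the `plan` and `first_option` functions, and the tea-making domain are all hypothetical names introduced here. Operators are already instantiated, so the binding choice is omitted, and there is no backtracking. The `choose` argument stands in for the choice points where default heuristics or control rules would decide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    # Hypothetical ground (already instantiated) operator.
    name: str
    preconds: frozenset
    adds: frozenset
    dels: frozenset = frozenset()

def first_option(kind, options):
    """Stand-in default heuristic: always take the first alternative."""
    return options[0]

def plan(state, goals, operators, choose=first_option, max_steps=100):
    state, goals = set(state), set(goals)
    selected, steps = [], []   # operators chosen but not yet applied; plan so far
    for _ in range(max_steps):
        # Step 1: terminate if the goal statement holds in the current state.
        if goals <= state:
            return steps
        # Step 2: pending goals and applicable operators.
        wanted = goals.union(*(op.preconds for op in selected))
        covered = set().union(*(op.adds for op in selected))
        pending = sorted(g for g in wanted if g not in state and g not in covered)
        applicable = [op for op in selected if op.preconds <= state]
        # Step 3: choose between subgoaling and applying an operator.
        options = (["apply"] if applicable else []) + (["subgoal"] if pending else [])
        if not options:
            return None        # a real planner would backtrack here
        if choose("decision", options) == "subgoal":
            # Step 4: expand the goal and select a relevant operator.
            goal = choose("goal", pending)
            relevant = [op for op in operators if goal in op.adds]
            if not relevant:
                return None
            selected.append(choose("operator", relevant))
        else:
            # Step 5: apply an operator whose preconditions are satisfied.
            op = choose("apply", applicable)
            state = (state - op.dels) | op.adds
            selected.remove(op)
            steps.append(op.name)
    return None

ops = [
    Operator("boil-water", frozenset({"have-water"}), frozenset({"hot-water"})),
    Operator("brew", frozenset({"hot-water", "have-teabag"}), frozenset({"have-tea"})),
]
result = plan({"have-water", "have-teabag"}, {"have-tea"}, ops)
print(result)  # ['boil-water', 'brew']
```

Passing a different `choose` function is where learned control knowledge would plug in under this sketch: it receives the kind of decision and the available alternatives, and may override the default of taking the first one.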
This extension is needed along several dimensions in order to address the new problems raised by decisions about interleaving multiple goals, combined with multiple operator and binding choices. We identify new learning opportunities related to PRODIGY’s ability to postpone planning commitments for efficiency and quality purposes. We introduce new language primitives for describing the learned control rules, in order to capture the information related to these new choices. HAMLET reduces the explanation effort by generating partial (“bounded”) explanations of branching decisions made during the search for a solution. HAMLET’s inductive learning module assures the incremental correctness of every deduced control rule. The use of inductive learning eliminates the need for an axiomatic domain theory to support the correct generalization of an episodic explanation, as required in EBL applied to PRODIGY2.0.

[2] Our method does not depend on this particular metric (solution length) and can use any operational optimality measure.

HAMLET’s Architecture

The inputs to HAMLET are a domain specified as a set of planning operators, a set of training problems, and a quality measure. The output is a set of control rules. HAMLET has three main modules: Bounded-Explanation, Induction, and Refinement.

  • The Bounded-Explanation module generates control rules from a PRODIGY search tree. These rules might be over-specific or over-general.
  • The Induction module addresses the problem of over-specificity by generalizing rules when analyzing positive examples.
  • The Refinement module replaces over-general rules with more specific ones when it finds situations in which the learned rules lead to wrong decisions.

HAMLET gradually learns and refines control rules, converging to a concise set of correct control rules, i.e., rules that are individually neither over-general nor over-specific. Figure 1(a) shows HAMLET’s modules and their connection to PRODIGY, and Figure 1(b) presents an outline of HAMLET’s algorithm.
Here ST and ST′ are search trees generated by the PRODIGY planning algorithm, L is the set of control rules, L′ is the set of new control rules learned by the Bounded-Explanation module, and L″ is the set of rules induced from L and L′ by the Induction module. We describe next the main features of the three modules of HAMLET.
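To make the idea of generalizing and specializing control rules more concrete, the Python sketch below treats a rule as a decision plus a set of applicability conditions. This is only a toy under stated assumptions, not HAMLET's rule language or algorithms: the `Rule` class, the `induce` and `refine` functions, the policy of intersecting conditions, the policy of dropping a rule that cannot be specialized, and the truck-domain literals are all hypothetical. It shows one way that removing conditions can generalize an over-specific rule and adding a condition can specialize an over-general one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    decision: str          # hypothetical label for the decision the rule recommends
    conditions: frozenset  # applicability conditions; all must hold for the rule to fire

    def fires(self, situation):
        return self.conditions <= situation

def induce(rules, new_rule):
    """Generalize: merge new_rule with an existing rule for the same decision
    by keeping only the conditions both share (i.e., removing conditions)."""
    for old in rules:
        shared = old.conditions & new_rule.conditions
        if old.decision == new_rule.decision and shared:
            return (rules - {old}) | {Rule(old.decision, shared)}
    return rules | {new_rule}

def refine(rules, bad_rule, positive, negative):
    """Specialize: replace an over-general rule by adding a condition that held
    in a positive example but not in the negative example where it misfired."""
    extra = sorted((positive - negative) - bad_rule.conditions)
    if not extra:
        # Assumption of this sketch: drop a rule that cannot be specialized.
        return rules - {bad_rule}
    specialized = Rule(bad_rule.decision, bad_rule.conditions | {extra[0]})
    return (rules - {bad_rule}) | {specialized}

# Two over-specific rules learned from two positive examples are merged.
r1 = Rule("unload", frozenset({"package-in-truck", "truck-at-goal-city", "truck-has-fuel"}))
r2 = Rule("unload", frozenset({"package-in-truck", "truck-at-goal-city", "driver-rested"}))
merged = induce(induce(set(), r1), r2)

# An over-general rule is specialized after it fires in a negative example.
too_general = Rule("unload", frozenset({"package-in-truck"}))
positive = {"package-in-truck", "truck-at-goal-city", "truck-has-fuel"}
negative = {"package-in-truck", "truck-has-fuel"}
refined = refine({too_general}, too_general, positive, negative)
```

In this toy run, both `merged` and `refined` end up holding a single rule whose conditions are `package-in-truck` and `truck-at-goal-city`: the first reached by dropping conditions, the second by adding one.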

Similar resources

Thesis Summary: Learning Search-Control Knowledge to Improve Plan Quality

Generating good, production-quality plans is an essential element in transforming planners from research tools into real-world applications, but one that has been frequently overlooked in research on machine learning for planning systems. Most work has been aimed at improving the efficiency of planning (“speed-up learning”) or at acquiring or refining domain knowledge. This thesis focuses on lea...

Control Knowledge to Improve Plan Quality

Generating production-quality plans is an essential element in transforming planners from research tools into real-world applications. However most of the work to date on learning planning control knowledge has been aimed at improving the efficiency of planning; this work has been termed “speed-up learning”. This paper focuses on learning control knowledge to guide a planner towards better solu...

Learning to Improve both Efficiency and Quality of Planning

Most research in learning for planning has concentrated on efficiency gains. Another important goal is improving the quality of final plans. Learning to improve plan quality has been examined by a few researchers, however, little research has been done learning to improve both efficiency and quality. This paper explores this problem by using the SCOPE learning system to acquire control know...

Clustering of nasopharyngeal carcinoma intensity modulated radiation therapy plans based on k-means algorithm and geometrical features

Background: The design of intensity modulated radiation therapy (IMRT) plans is difficult and time-consuming. The retrieval of similar IMRT plans from the IMRT plan dataset can effectively improve the quality and efficiency of IMRT plans and automate the design of IMRT planning. However, the large IMRT plans datasets will bring inefficient retrieval result. Materials and Methods: An intensity-m...

The Goal is to Produce Better Plans

The purpose of a learner is to make changes in a performance system to do similar tasks more effectively the next time. The meaning of “more effectively” can be only asserted in the context of the learner’s goals. Three types of learning goals can be distinguished in the context of planning systems: domain goals, planning efficiency goals, and plan efficiency or plan quality goals. Most work to...

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use boosting ensemble of weak classifiers to implement misuse intrusion detection system. It can identify new classes types of intrusions that do not exist in the training dataset for incremental misuse detection. As...


Publication date: 1994